Querying Hebrew Texts via Word Spotting

Authors

  • Adiel Ben-Shalom
  • Adi Silberpfennig
  • Nachum Dershowitz
  • Lior Wolf
  • Yaacov Choueka
Abstract

We report on recent results with word-spotting (WS) in Hebrew historical texts, manuscript and printed. The advantage of such a retrieval system is that it works on images without any need for manual or computer transcription of the texts. The method allows for extremely rapid querying, while still maintaining high accuracy; thus, it should be considered as an important tool in historical textual research.

Similar resources

Lexical Affect Sensing: Word Spotting Revisited

Recently, there has been considerable interest in the recognition of affect from texts. In this paper, we revisit the word spotting technique for affect sensing in short texts, for which purpose, we extract words from different affect dictionaries and explore the performance of various strategies for sensing affect.

Automatic Transliteration of Judeo-Arabic Texts into Arabic Script

The Judeo-Arabic languages comprise a set of dialects spoken and written by Jewish communities living in Arab countries, mainly during the middle ages. Judeo-Arabic is typically written in Hebrew letters, enriched with various diacritic marks. The Judeo-Arabic spoken and written by any particular Jewish community is similar to the Arabic dialect used by their local Muslim community. In additi...

A survey of document image word spotting techniques

Vast collections of documents available in image format need to be indexed for information retrieval purposes. In this framework, word spotting is an alternative solution to optical character recognition (OCR), which is rather inefficient for recognizing text of degraded quality and unknown fonts usually appearing in printed text, or writing style variations in handwritten documents. Over the p...

Identifying translationese at the word and sub-word level

We use text classification to distinguish automatically between original and translated texts in Hebrew, a morphologically complex language. To this end, we design several linguistically informed feature sets that capture word-level and sub-word-level (in particular, morphological) properties of Hebrew. Such features are abstract enough to allow for the development of accurate, robust classifie...

A Morphological, Syntactic, and Semantic Search Engine for Hebrew Texts

This article describes the construction of a morphological, syntactic and semantic analyzer to operate a high-grade search engine for Hebrew texts. A good search engine must be complete and accurate. In Hebrew or Arabic script most of the vowels are not written, many particles are attached to the word without space, a double consonant is written with one letter, and some letters signify both vo...

Publication date: 2017